Configuration-less network locking infrastructure for shared file systems

ABSTRACT

A network-based method for managing locks in a shared file system (SFS) for a group of hosts that does not require any configuration to identify a server for managing locks for the SFS. Each host in the group carries out the steps of checking a predetermined storage location to determine whether there is a host ID written in the predetermined location. If there is no host ID written in the predetermined location, the first host to notice this condition writes its host ID in the predetermined location to identify itself as the server for managing locks. If there is a host ID written in the predetermined location, the host ID of the server for managing locks is maintained in local memory. When the host needs to perform IO operations on a file of the SFS, it communicates with the server for managing locks over the network using the host ID of the server for managing locks to obtain a lock to the file.

CROSS-REFERENCE

This application is a continuation of U.S. patent application Ser. No. 13/037,808, entitled “CONFIGURATION-LESS NETWORK LOCKING INFRASTRUCTURE FOR SHARED FILE SYSTEMS,” filed Mar. 1, 2011, having Attorney Docket No. A508, which is incorporated herein by reference.

BACKGROUND

In a shared file system (SFS), a lock mechanism is employed to manage concurrent accesses to files from more than one server. Early SFSs relied on SCSI-2 reserve/release primitive commands to provide a server exclusive access to a logical storage volume that stored files of interest. Unfortunately, SCSI-2 reserve primitives are expensive as they lock the entire logical storage volume and input/output (IO) operations on the logical storage volume are not permitted so long as the SCSI-2 reserve primitive is in effect.

Virtual Machine File System (VMFS) is a proprietary SFS developed by VMware, Inc. of Palo Alto, Calif. VMFS introduces the notion of a disk lock that protects specific resources of the VMFS, e.g., files, bitmaps, etc. Rather than locking the entire logical storage volume using the SCSI-2 reserve primitive, a server can simply acquire a lock associated with the resource to which an IO operation needs to be performed. This significantly reduces the overall duration of a SCSI-2 reserve/release, as a SCSI-2 release may be issued immediately after a lock protecting a resource is updated as “locked.” However, the scaling of this locking scheme remains a challenge.

SUMMARY

One or more embodiments of the present invention provide a network-based method for managing locks in an SFS. One feature of the network-based method according to embodiments of the present invention is that it can identify a server for managing locks without any configuration.

A method of managing locks in a shared file system (SFS) for a group of hosts, according to an embodiment of the present invention, includes the steps of writing a host ID in a predetermined location to identify the host that is acting as a server for managing locks, and communicating with said server for managing locks over a network to obtain locks to files of the SFS. Any of the hosts in the group can serve as said server for managing locks and once the host ID of said server for managing locks is written in the predetermined location, all other hosts in the group communicate with said server for managing locks to obtain locks to files of the SFS.

According to another embodiment of the present invention, each host in the group carries out the steps of checking a predetermined location to see whether or not there is a host ID written in the predetermined location. If there is no host ID written in the predetermined location, the first host to notice this condition writes its host ID in the predetermined location to identify itself as the server for managing locks. If there is a host ID written in the predetermined location, the host ID of said server for managing locks is maintained in local memory. When the host needs to perform IO operations on files of the SFS, it communicates with said server for managing locks over the network using the host ID of said server for managing locks stored in local memory.

According to a further embodiment of the present invention, in the event that one of the other hosts determines that its communication with a server for managing locks over the network has failed, the host posts a message in a data structure owned by said server for managing locks to employ an alternative locking technique that does not rely on communications with said server for managing locks over the network. Said server for managing locks, in response to seeing the message in the data structure that it owns, posts a confirmation message to confirm use of the alternative locking technique, whereupon each of the hosts communicates with the SFS using the alternative locking technique. Upon successful reconnection with the original server for managing locks over the network, each host in the group reverts to the technique that relies on communications with said server for managing locks over the network.

Further embodiments of the present invention provide a non-transient computer readable storage medium that includes instructions for causing a computer system to carry out one or more of the methods set forth above.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a computer system configuration utilizing a shared file system in which one or more embodiments of the present invention may be implemented.

FIG. 2 illustrates a virtual machine based system in which one or more embodiments of the present invention may be implemented.

FIG. 3 illustrates a configuration for locking files to enable multiple servers to access a data storage unit concurrently, according to one or more embodiments of the present invention.

FIG. 4 is a flow diagram of method steps for designating a server as a lock server when no lock server is designated, according to one or more embodiments of the present invention.

FIGS. 5A-5B is a flow diagram of method steps for performing an IO operation to a file when a lock server has been designated, according to one or more embodiments of the present invention.

FIG. 6 is a flow diagram of method steps carried out by a lock server to issue locks to servers that submit lock requests, according to one or more embodiments of the present invention.

FIG. 7 is a flow diagram of method steps for determining and responding to a lock server failure or a network partition, according to one or more embodiments of the present invention.

DETAILED DESCRIPTION

FIG. 1 illustrates a computer system configuration utilizing an SFS, also known as a cluster file system, in which one or more embodiments of the present invention may be implemented. The computer system configuration of FIG. 1 includes multiple servers 100 _(A) to 100 _(N), each of which is connected to storage area network (SAN) 105 and networked to one another through local area network (LAN) 103. Operating systems 110 _(A) and 110 _(B) on servers 100 _(A) and 100 _(B) interact with an SFS 115 that resides on a data storage unit (DSU) 120 accessible through SAN 105. In particular, DSU 120 is a logical unit of one or more data storage systems 125 ₁ to 125 _(M) (e.g., disk array) connected to SAN 105. While DSU 120 is exposed to operating systems 110 _(A) and 110 _(B) by storage system manager 130 (e.g., disk controller) as a contiguous logical storage space, the actual physical data blocks upon which SFS 115 may be stored are dispersed across the various physical disk drives 135 _(X) to 135 _(Z) of the data storage system.

Data in DSU 120 (and possibly other DSUs exposed by the data storage systems) is accessed and stored in accordance with structures and conventions imposed by an SFS 115 that stores such data as a plurality of files of various types, typically organized into one or more directories. SFS 115 further includes file system management or metadata structures that store information, for example, about how data is stored within SFS 115, such as block bitmaps that indicate which data blocks in SFS 115 remain available for use, along with other metadata structures such as file descriptors or inodes for directories and files in SFS 115. In one embodiment, each of servers 100 is configured with a hypervisor to support the execution of virtual machines each having a virtual disk represented by a file within SFS 115. One example of SFS 115 is VMFS (Virtual Machine File System), which is an SFS for supporting virtual machines available from VMware, Inc. of Palo Alto, Calif.

FIG. 2 illustrates a virtual machine based system 200 in which one or more embodiments of the present invention may be implemented. An integral computer system 201, generally corresponding to one of the computer system servers 100, is constructed on a conventional, typically server-class hardware platform 224, including in particular host bus adapters (HBAs) 226 in addition to conventional platform processor, memory, and other standard peripheral components (not separately shown). The hardware platform 224 executes a hypervisor 214 supporting a virtual machine execution space 202 within which virtual machines (VMs) 203 _(A-N) are executed. In one or more embodiments of the present invention, the hypervisor 214 and virtual machines 203 _(A-N) are implemented using the vSphere™ product developed and distributed by VMware, Inc.

In summary, the hypervisor 214 provides the necessary services and support to enable concurrent execution of the virtual machines 203 _(A-N). In turn, each virtual machine 203 _(A-N) implements a virtual hardware platform 210 as a conceptual layer that supports the execution of a guest operating system 208 and one or more client application programs 206. In one or more embodiments of the present invention, the guest operating systems 208 are instances of Microsoft Windows, Linux and Netware-based operating systems, or the like. Other guest operating systems can be equivalently used. In each instance, the guest operating system 208 includes a native file system layer, typically either an NTFS or ext3FS type file system layer. These file system layers interface with the virtual hardware platforms 210 to access, from the perspective of the guest operating systems 208, a data storage host bus adapter. In one implementation, the virtual hardware platform 210 implements a virtual host bus adapter 212 that emulates the necessary system hardware support to enable execution of the guest operating system 208 transparently to the virtualization of the system hardware.

File system calls initiated by the guest operating systems 208 to perform file system-related data transfer and control operations are processed and passed through the virtual host bus adapter 212 to adjunct virtual machine monitor (VMM) layers 204 _(A-N) that implement the virtual system support necessary to coordinate operation with the hypervisor 214. In particular, a host bus emulator 213 functionally enables the data transfer and control operations to be ultimately passed to the host bus adapters 226. File system calls for performing data transfer and control operations generated, for example, by applications 206 are translated and passed to a virtual machine file system (VMFS) driver or component 216 that manages access to files (e.g., virtual disks, etc.) stored in data storage systems (such as data storage system 125) that may be accessed by any of the virtual machines 203 _(A-N). In one embodiment, access to DSU 120 is managed by VMFS driver 216 and SFS 115 for DSU 120 is a virtual machine file system (VMFS) that represents the organization of files and directories stored in DSU 120, in accordance with structures understood by VMFS driver 216. For example, guest operating systems 208 receive file system calls and perform corresponding command and data transfer operations against virtual disks, such as virtual SCSI devices accessible through virtual HBA 212, that are visible to the guest operating systems 208. Each such virtual disk may be maintained as a file or set of files stored on VMFS, for example, in DSU 120. Guest operating system 208 file system calls are translated from instructions applicable to a virtual disk visible to the guest operating systems 208 to instructions applicable to a file representing the virtual disk in DSU 120 exposed by data storage system 125 to VMFS. Such translations are performed through a number of component layers of an “IO stack,” beginning at the guest operating system 208 (which receives the file system calls from applications 206), through virtual HBA 212, host bus emulator 213, VMFS driver 216, a logical volume manager 218 which assists VMFS driver 216 with mapping files stored in VMFS with DSU 120 exposed by data storage systems networked through SAN 105, a data access layer 222, including device drivers, and host bus adapters 226 (which, e.g., issue a SCSI command to data storage system 125 to access DSU 120).

It should be recognized that the various terms, layers and categorizations used to describe the virtualization components in FIG. 2 may be referred to differently without departing from their functionality or the spirit or scope of the invention. For example, VMMs 204 _(A-N) may be considered separate virtualization components between VMs 203 _(A-N) and hypervisor 214 (which, in such a conception, may itself be considered a virtualization “kernel” component) since there exists a separate VMM for each instantiated VM. Alternatively, each VMM may be considered to be a component of its corresponding virtual machine since such VMM includes the hardware emulation components for the virtual machine. In such an alternative conception, for example, the conceptual layer described as virtual hardware platform 210 may be merged with and into VMMs 204 _(A-N) such that virtual host bus adapter 212 is removed from FIG. 2 (i.e., since its functionality is effectuated by host bus adapter emulator 213).

Turning now to FIG. 3, details of DSU 120 and SFS 115 are shown. As shown, SFS 115 includes a master lock 302 for SFS 115 in which an identity of a lock server according to one or more embodiments of the present invention is stored. Similar to other locks described herein (which protect inodes, bitmaps, etc.), master lock 302 prevents simultaneous modification of SFS 115, thereby preventing multiple lock servers from being elected. In one embodiment, master lock 302 includes an owner data field 304, a heartbeat data field 306, and a mailbox 312. Owner data field 304 includes an IP address of a server that is designated as the lock server. Owner data field 304 may contain a zero or some other special value to indicate that no server currently owns master lock 302. Heartbeat data field 306 is referenced by the lock server and is used to indicate whether the lock server is active or inactive. Locks that are managed using heartbeats maintained in a heartbeat region are detailed in U.S. patent application Ser. No. 11/676,109, which is incorporated by reference herein.
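
The layout of master lock 302 described above can be pictured with a minimal in-memory sketch. It is an illustration only and not the on-disk format: the class name, the field names, the method names, and the use of a timestamp for the heartbeat are assumptions, and the “0.0.0.0” sentinel is simply one possible special value for an unowned lock.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

NO_OWNER = "0.0.0.0"  # assumed sentinel: no lock server designated


@dataclass
class MasterLock:
    """Illustrative in-memory stand-in for master lock 302."""
    owner: str = NO_OWNER    # owner data field 304: IP address of the lock server
    heartbeat: float = 0.0   # heartbeat data field 306: time of the last heartbeat
    # mailbox 312: rows of (time value, subject, message)
    mailbox: List[Tuple[float, str, str]] = field(default_factory=list)

    def has_lock_server(self) -> bool:
        return self.owner != NO_OWNER

    def lock_server_active(self, now: float, max_age: float) -> bool:
        # The lock server counts as active only while its heartbeat is recent.
        return self.has_lock_server() and (now - self.heartbeat) <= max_age
```

In this sketch a host would call `has_lock_server()` before deciding whether to contact a lock server or to try to become one.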

Mailbox 312 may be used to allow each of servers 100 to communicate with the lock server when any of servers 100 experiences a network partition over LAN 103. Specifically, there is only one mailbox 312 per file system and the lock server has ownership of the one mailbox 312. In one embodiment, mailbox 312 is implemented using a data structure that includes a plurality of rows, where each row stores a time value, a subject, and a message. Servers 100 may read and/or write messages to the mailbox 312 to communicate information to the designated lock server when they experience a network partition over LAN 103.
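
The row-based mailbox described above might be modeled as follows. This is a hedged sketch: the `MailboxRow` and `Mailbox` names and the `post` and `latest` helpers are hypothetical, and a real mailbox 312 would live on shared storage rather than in process memory.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MailboxRow:
    time_value: float  # when the row was written
    subject: str       # what the row is about
    message: str       # free-form payload


@dataclass
class Mailbox:
    """Illustrative stand-in for mailbox 312 (one per file system)."""
    rows: List[MailboxRow] = field(default_factory=list)

    def post(self, time_value: float, subject: str, message: str) -> None:
        self.rows.append(MailboxRow(time_value, subject, message))

    def latest(self, subject: str) -> Optional[MailboxRow]:
        # Return the most recent row with the given subject, if any.
        matches = [row for row in self.rows if row.subject == subject]
        return max(matches, key=lambda row: row.time_value) if matches else None
```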

SFS 115 also includes inodes 350 for files and directories stored in SFS 115. Each inode 350 includes metadata 360 that may include, among other things, block information that identifies data blocks to which inode 350 refers, such as physical blocks within SFS 115 that store data for a file or directory. In addition to metadata 360, each inode 350 is associated with a lock 352. Lock 352 governs access to the underlying data of a file or directory associated with inode 350. Lock 352 comprises an owner data field 354 and a heartbeat field 356. Owner data field 354 contains the same type of information as owner data field 304, and heartbeat data field 356 contains the same type of information as heartbeat data field 306.
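
As a rough illustration of how an inode 350 and its lock 352 relate, consider the sketch below. The `FileLock` and `Inode` names, the list of block numbers, and the “0.0.0.0” sentinel for a free lock are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List

NO_OWNER = "0.0.0.0"  # assumed sentinel for a lock that nobody holds


@dataclass
class FileLock:
    """Illustrative stand-in for lock 352."""
    owner: str = NO_OWNER   # owner data field 354
    heartbeat: float = 0.0  # heartbeat field 356

    def is_free(self) -> bool:
        return self.owner == NO_OWNER


@dataclass
class Inode:
    """Illustrative stand-in for inode 350."""
    blocks: List[int] = field(default_factory=list)   # block information in metadata 360
    lock: FileLock = field(default_factory=FileLock)  # lock 352 guarding the file data
```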

One or more embodiments of the present invention provide a fine-granularity locking scheme within SFS 115 that minimizes use of SCSI primitives. As a first step in this scheme, one of servers 100 assigns itself as a lock server by updating owner data field 304 of master lock 302 with its own IP address. This assignment is accomplished using, for example, SCSI reservation primitives that allow one of servers 100 to atomically interact with SFS 115. Upon assignment of a lock server, however, communication between servers 100 and SFS 115 is facilitated by the lock server, not the use of primitives. More specifically, any server that performs an IO operation on a file residing in SFS 115 reads owner data field 304 of master lock 302 at least once to determine the IP address of the lock server, and issues a lock request to the lock server whenever it needs to perform an IO operation on a file residing in SFS 115. The lock server receives the lock request through LAN 103 and responds by indicating that the file is either now successfully locked for use by the requesting server (which is accomplished by inserting, into the master lock 302, information associated with the requesting server) or that the file is currently locked by another server. Assuming that the request to lock the file is granted, the requesting server may subsequently proceed with the IO operation. When the IO operation has completed, and if no additional IO operations are to be performed to the file, the requesting server notifies the lock server and releases the lock to the file. In the event that the lock server becomes inaccessible to servers 100 due to, for example, the lock server experiencing a failure, or a network partition, servers 100 take corrective actions, which are further described below.

FIG. 4 is a flow diagram of method steps 400 for designating a server as a lock server when no lock server is designated, according to one or more embodiments of the present invention. Although the method steps are described in conjunction with the systems of FIGS. 1-2, persons skilled in the art will understand that any system configured to perform the method steps is within the scope of the invention.

Method 400 begins at step 402, where one of servers 100 receives a request to perform an IO operation to a file of SFS 115. For example, an application executing within server 100 _(A) may issue a write request to update a file of SFS 115. At step 404, server 100 _(A) determines that no server IP address is set in owner data field 304 of master lock 302. For example, if LAN 103 implements the TCP/IP protocol, an owner data field 304 that stores a value of “0.0.0.0” would indicate that none of servers 100 has been designated as a lock server. At step 406, server 100 _(A) obtains exclusive access to master lock 302. Server 100 _(A) may obtain this exclusive access by, for example, issuing a SCSI-2 reserve command to SFS 115. Details of this technique may be found in U.S. patent application Ser. No. 11/676,109. Upon getting the exclusive access, at step 408, server 100 _(A) updates owner data field 304 of master lock 302 to an IP address associated with server 100 _(A) and, optionally, the heartbeat data field 306 with data associated with server 100 _(A). Next, at step 410, server 100 _(A) sets the file as locked by updating a lock 352 that corresponds to the file. Then, at step 412, server 100 _(A) executes the IO operation to the file of SFS 115. After server 100 _(A) has been successfully designated as the lock server, server 100 _(A) is required to update its heartbeat either directly in heartbeat data field 306 or in a heartbeat region referenced by a pointer stored in heartbeat data field 306 at a timed interval so that the other servers can determine that the lock server is active.
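
Steps 404 through 408 can be sketched roughly as follows. This is a simplified illustration, not the described implementation: a `threading.Lock` stands in for the exclusive access that a SCSI-2 reserve would provide, the master lock is an in-memory object rather than an on-disk structure, and re-checking the owner field after obtaining exclusive access is an assumption added so that two hosts racing through step 404 cannot both win. All names are hypothetical.

```python
import threading

NO_OWNER = "0.0.0.0"  # value indicating that no lock server is designated


class MasterLock:
    """In-memory stand-in for master lock 302."""

    def __init__(self) -> None:
        self.owner = NO_OWNER                # owner data field 304
        self.heartbeat = 0.0                 # heartbeat data field 306
        self.reservation = threading.Lock()  # stand-in for a SCSI-2 reserve


def try_become_lock_server(master: MasterLock, my_ip: str, now: float) -> bool:
    """Claim the master lock if no owner is recorded; return True on success."""
    if master.owner != NO_OWNER:          # step 404: a lock server already exists
        return False
    with master.reservation:              # step 406: obtain exclusive access
        if master.owner != NO_OWNER:      # assumed re-check under exclusive access
            return False
        master.owner = my_ip              # step 408: record own IP address
        master.heartbeat = now            # optional heartbeat update
        return True
```

A host that receives `False` would then read the owner field and direct its lock requests to the address stored there.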

FIGS. 5A-5B is a flow diagram of method steps 500 for performing an IO operation to a file when a lock server has been designated, according to one or more embodiments of the present invention. Although the method steps are described in conjunction with the systems of FIGS. 1-2, persons skilled in the art will understand that any system configured to perform the method steps is within the scope of the invention.

Method 500 begins at step 502, where one of servers 100, e.g., server 100 _(B), receives a request to perform an IO operation on a file of SFS 115, similar to step 402 in FIG. 4 described above. At step 504, server 100 _(B) determines that owner data field 304 of master lock 302 includes a non-zero entry; however, server 100 _(B) may optionally use a cache to optimize the performance of this determining step. For example, owner data field 304 may store a value of “192.168.1.1” to signify an IP address of server 100 _(A), indicating that server 100 _(A) has been designated as the lock server. At step 506, server 100 _(B) directs, to the lock server that is identified in owner data field 304, a request to obtain a lock 352 that is associated with the file upon which the IO operation is to be performed. In one embodiment, server 100 _(B) generates a TCP/IP message directed to the IP address stored in owner data field 304, where the payload of the TCP/IP message includes information associated with the file. Accordingly, this TCP/IP message is transmitted from server 100 _(B) to the lock server through LAN 103. At step 508, server 100 _(B) receives, from the lock server, a response to the request to obtain the lock to the file through LAN 103. At step 510, server 100 _(B) determines whether the response indicates that the file is locked by another server. If, at step 510, server 100 _(B) determines that the response indicates that the file is locked by another server, step 506 is repeated, or a “lock not free” message is returned to upper layers, e.g., an application requesting the lock 352. Otherwise, the lock server inserts information associated with server 100 _(B) (e.g., the IP address of server 100 _(B)) into owner data field 354 and heartbeat field 356 of the lock associated with the file and method 500 proceeds to step 512. At step 512, server 100 _(B) performs the IO operation on the file. At step 514, server 100 _(B) receives a request to unlock the file. Optionally, server 100 _(B) may be configured to reference a cache to determine whether a connection to the lock server or file is in an active state. At step 516, server 100 _(B) routes the unlock request to the lock server. At step 518, server 100 _(B) receives, from the lock server, a response to the unlock request. At step 520, server 100 _(B) determines whether the response indicates that the file has been successfully unlocked by the server. Subsequently, step 522 or step 524 follows, depending on whether the file was successfully unlocked by the server.
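
The requesting server's side of method 500 might look roughly like the sketch below. It makes several assumptions that are not in the description: the message format (dictionaries with `op`, `file`, and `requester` keys), the reply values, the bounded retry count, and the `send` callable that stands in for a TCP/IP exchange over LAN 103. The `LockNotFree` exception plays the role of the “lock not free” message returned to upper layers.

```python
from typing import Callable, Dict

# Hypothetical transport: takes (lock_server_ip, message) and returns a reply.
Transport = Callable[[str, Dict[str, str]], Dict[str, str]]


class LockNotFree(Exception):
    """Raised when the lock could not be obtained within the retry budget."""


def perform_io(lock_server_ip: str, my_ip: str, path: str, send: Transport,
               io_op: Callable[[str], str], max_attempts: int = 3) -> str:
    # Steps 506-510: request the lock, repeating while another server holds it.
    for _ in range(max_attempts):
        reply = send(lock_server_ip, {"op": "lock", "file": path, "requester": my_ip})
        if reply["status"] == "granted":
            break
    else:
        raise LockNotFree(path)
    try:
        return io_op(path)  # step 512: perform the IO operation
    finally:
        # Steps 514-518: route the unlock request to the lock server.
        send(lock_server_ip, {"op": "unlock", "file": path, "requester": my_ip})
```

Placing the unlock in a `finally` clause is a design choice of this sketch, so that the lock is released even when the IO operation raises.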

FIG. 6 is a flow diagram of method steps 600 carried out by a lock server to issue locks to servers 100 that are submitting lock requests, according to one or more embodiments of the present invention. Although the method steps are described in conjunction with the systems of FIGS. 1-2, persons skilled in the art will understand that any system configured to perform the method steps is within the scope of the invention.

Method 600 begins at step 602, where a lock server, e.g., server 100 _(A), receives a request to obtain a lock to a file of SFS 115. For example, an application executing on server 100 _(C), a server not acting as the lock server, may request the lock so that it can perform IO operations on a file of SFS 115. At step 604, the lock server determines whether the file is currently locked. To make this determination, the lock server reads the lock 352 associated with the file of SFS 115 and parses owner data field 354 of the lock 352. In one embodiment, the lock server includes a cache that stores information that the lock server has previously read from or written to locks to files of SFS 115. If, at step 604, the lock server determines that the file is not currently locked, then the method proceeds to step 606. At step 606, the lock server writes information associated with server 100 _(C) into owner data field 354 and heartbeat field 356. At step 608, the lock server notifies server 100 _(C) through LAN 103 that the file is now locked for use by server 100 _(C).

Referring now back to step 604, if the lock server determines that the file is locked by another server not acting as the lock server, e.g., server 100 _(B), then the method proceeds to step 610, where the lock server notifies server 100 _(C) through LAN 103 that the file is currently locked by another server. Subsequently, server 100 _(C) repeatedly submits a lock request to the lock server over LAN 103 until the lock to the file becomes available, or a “lock not free” message is returned to upper layers.
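
The lock server's decision in steps 604 through 610 can be sketched as below. The dictionary is an assumed in-memory stand-in for the owner data fields 354 of the on-disk locks (or for the lock server's cache of them), and the class name, method names, and reply strings are hypothetical.

```python
from typing import Dict


class LockServer:
    """Illustrative lock server that tracks one owner per file."""

    def __init__(self) -> None:
        # file path -> IP address of the server holding the lock
        self.owners: Dict[str, str] = {}

    def handle_lock(self, path: str, requester_ip: str) -> str:
        if path in self.owners:            # step 604: file is currently locked
            return "busy"                  # step 610: notify the requester
        self.owners[path] = requester_ip   # step 606: record the new owner
        return "granted"                   # step 608: notify the requester

    def handle_unlock(self, path: str, requester_ip: str) -> str:
        # Only the recorded owner may release the lock (an assumption).
        if self.owners.get(path) == requester_ip:
            del self.owners[path]
            return "released"
        return "not_owner"
```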

FIG. 7 is a flow diagram of method steps 700 for determining and responding to a lock server failure or a network partition, according to one or more embodiments of the present invention. Although the method steps are described in conjunction with the systems of FIGS. 1-2, persons skilled in the art will understand that any system configured to perform the method steps is within the scope of the invention.

Method 700 begins at step 702, where a server 100, e.g., server 100 _(B), submits a request to lock a file residing within SFS 115. As previously described, such a request is directed to the lock server through LAN 103. At step 704, server 100 _(B) determines that the request has timed out, e.g., if the lock server fails to respond to the lock request within a predetermined threshold. At step 706, server 100 _(B) examines the heartbeat of the lock server, either by examining the heartbeat data field 306 of master lock 302 or the heartbeat region referenced by a pointer stored in heartbeat data field 306 of master lock 302, as the case may be. If, at step 706, server 100 _(B) determines that the heartbeat of the lock server is up-to-date, then method 700 proceeds to step 708.

At step 708, server 100 _(B) determines that it is experiencing a network partition, because it is unable to communicate with the lock server over LAN 103 although the heartbeat of the lock server is up-to-date. Such a partition may occur, for example, if a network card or a network cable fails to operate properly. Under this condition, servers 100 need to revert to a non-network based lock management technique, such as those based on SCSI-2 reserve/release primitives, as described in detail in U.S. patent application Ser. No. 11/676,109. To initiate the reversion process, the server experiencing the network partition (server 100 _(B) in this example), at step 710, inserts a message, using, e.g., SCSI-based primitives, in mailbox 312 of master lock 302 to request to revert to the non-network based lock management technique. At step 712, server 100 _(B) waits for confirmation/acknowledgement that the lock server has received this message. Similarly, the lock server can confirm/acknowledge the receipt of the message by using SCSI-based reservations to write to mailbox 312. Additionally, while server 100 _(B) waits, the lock server reverts to performing SCSI-based reservations to grant locks on behalf of the servers 100 that have not yet detected the network partition. Upon such confirmation/acknowledgement, at step 714, server 100 _(B) performs any IO operations using the non-network based lock management technique (e.g., by issuing SCSI primitives).
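
The heartbeat check at step 706 and the mailbox handshake of steps 710 through 712 could be sketched as follows. The subject strings, the function names, and the representation of the mailbox as a list of `(time value, subject, message)` tuples are assumptions for illustration; in the described embodiments the mailbox rows would be written with SCSI-based primitives rather than list appends.

```python
from typing import List, Tuple

Row = Tuple[float, str, str]  # (time value, subject, message)

REVERT_REQUEST = "revert-request"  # hypothetical subject strings
REVERT_ACK = "revert-ack"


def classify_timeout(heartbeat_age: float, max_age: float) -> str:
    """Step 706: a fresh heartbeat means partition, a stale one means failure."""
    return "partition" if heartbeat_age <= max_age else "server-failed"


def request_revert(mailbox: List[Row], now: float, host_ip: str) -> None:
    """Step 710: ask the lock server to fall back to non-network locking."""
    mailbox.append((now, REVERT_REQUEST, host_ip))


def acknowledge_reverts(mailbox: List[Row], now: float) -> List[str]:
    """Lock server side: acknowledge every request not yet acknowledged."""
    acked = {msg for _, subj, msg in mailbox if subj == REVERT_ACK}
    pending = list(dict.fromkeys(
        msg for _, subj, msg in mailbox
        if subj == REVERT_REQUEST and msg not in acked))
    for host_ip in pending:
        mailbox.append((now, REVERT_ACK, host_ip))
    return pending


def revert_confirmed(mailbox: List[Row], host_ip: str) -> bool:
    """Step 712: has the lock server acknowledged this host's request?"""
    return any(subj == REVERT_ACK and msg == host_ip for _, subj, msg in mailbox)
```

A partitioned host would poll `revert_confirmed` and switch to the non-network technique once it returns `True`.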

Referring back now to step 706, if server 100 _(B) determines that the heartbeat of the lock server is not up-to-date, then method 700 proceeds to step 716. At step 716, it is determined that the lock server has failed and server 100 _(B) carries out, at step 718, an attempt to designate itself as the lock server. At step 718, server 100 _(B) attempts to override master lock 302 and update owner data field 304 with information that is associated with server 100 _(B). Failure of this attempt indicates that another server 100 has already assigned itself as the lock server. However, if the master lock 302 is overridden, and the owner data field 304 is updated, the server 100 _(B) has successfully assigned itself as the lock server, and the method ends. As shown in step 720, any subsequent lock requests are directed to server 100 _(B) over LAN 103. Details of reclaiming ownership of a lock server can be found in U.S. patent application Ser. No. 11/676,109, which discloses a technique that prevents two or more servers from designating themselves as the new lock server. In addition, the steps 702-704, as described above, are repeated to determine whether a network partition persists.
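
One way to picture the takeover attempt at step 718 is a compare-and-swap on the owner field, sketched below. This is not the technique of the referenced application, whose details are not reproduced here; the `try_takeover` name, the timestamp heartbeat, and the `threading.Lock` standing in for an atomic on-disk update are all assumptions.

```python
import threading


class MasterLock:
    """In-memory stand-in for master lock 302 with a takeover helper."""

    def __init__(self, owner: str, heartbeat: float) -> None:
        self.owner = owner              # owner data field 304
        self.heartbeat = heartbeat      # heartbeat data field 306
        self._guard = threading.Lock()  # stand-in for an atomic on-disk update

    def try_takeover(self, expected_owner: str, new_owner: str,
                     now: float, max_age: float) -> bool:
        with self._guard:
            if self.owner != expected_owner:
                return False  # another server already became the lock server
            if now - self.heartbeat <= max_age:
                return False  # heartbeat is fresh, so the owner has not failed
            self.owner = new_owner
            self.heartbeat = now
            return True
```

With this shape, a second host that also observed the stale heartbeat fails its attempt because the owner field no longer matches the value it read.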

The embodiments described herein employ a single lock server for managing locks in an SFS for a group of hosts. In alternative embodiments, multiple lock servers, each managing locks for a subset of files on the SFS, may be used. The various embodiments described herein may employ various computer-implemented operations involving data stored in computer systems. For example, these operations may require physical manipulation of physical quantities; usually, though not necessarily, these quantities may take the form of electrical or magnetic signals, where they or representations of them are capable of being stored, transferred, combined, compared, or otherwise manipulated. Further, such manipulations are often referred to in terms such as producing, identifying, determining, or comparing. Any operations described herein that form part of one or more embodiments of the invention may be useful machine operations. In addition, one or more embodiments of the invention also relate to a device or an apparatus for performing these operations. The apparatus may be specially constructed for specific required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.

The various embodiments described herein may be practiced with other computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.

One or more embodiments of the present invention may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer readable media. The term computer readable medium refers to any data storage device that can store data which can thereafter be input to a computer system; computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer. Examples of a computer readable medium include a hard drive, network attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD (Compact Disc) such as a CD-ROM, a CD-R, or a CD-RW, a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over a network coupled computer system so that the computer readable code is stored and executed in a distributed fashion.

Although one or more embodiments of the present invention have been described in some detail for clarity of understanding, it will be apparent that certain changes and modifications may be made within the scope of the claims. Accordingly, the described embodiments are to be considered as illustrative and not restrictive, and the scope of the claims is not to be limited to details given herein, but may be modified within the scope and equivalents of the claims. In the claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims.

Virtualization systems in accordance with the various embodiments, implemented as hosted embodiments, non-hosted embodiments, or as embodiments that tend to blur distinctions between the two, are all envisioned. Furthermore, various virtualization operations may be wholly or partially implemented in hardware. For example, a hardware implementation may employ a look-up table for modification of storage access requests to secure non-disk data.

Many variations, modifications, additions, and improvements are possible, regardless of the degree of virtualization. The virtualization software can therefore include components of a host, console, or guest operating system that performs virtualization functions. Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the invention(s). In general, structures and functionality presented as separate components in exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the appended claim(s).

What is claimed is:
 1. In a system including a group of servers that communicate with each other over a first network and issue storage commands to a shared data storage system over a second network, a method of managing locks of files stored in the shared data storage system using a master lock that includes a data field for storing an address of one of the servers in the group to identify such server as a current server for managing the locks of files stored in the shared data storage system, said method comprising: at a first server of the group, receiving a request to update a particular file stored in the shared data storage system from an application executing within the first server; determining, by the first server, whether an address of another server in the group is stored in the data field of the master lock; if the data field of the master lock does not contain an address of another server in the group, updating the data field of the master lock to store an address of the first server and obtaining a lock to the particular file; if the data field of the master lock does contain the address of another server in the group, communicating with said another server over the first network using the address, a request to obtain the lock to the particular file; and after the lock to the particular file is obtained, performing an input-output operation on the particular file to fulfill the request to update the particular file.
 2. The method of claim 1, wherein the lock to the particular file includes a lock owner data field, and a server ID of the first server is stored in the lock owner data field when the lock to the particular file is obtained by the first server.
 3. The method of claim 1, further comprising: rejecting, by said another server, the request to obtain the lock to the particular file if the lock to the particular file has been granted to a different server in the group.
 4. The method of claim 3, further comprising: if the request to obtain the lock to the particular file is rejected, communicating with said another server over the first network using the address, another request to obtain the lock to the particular file.
 5. The method of claim 1, wherein the master lock and the locks of files are stored in the shared data storage system.
 6. The method of claim 1, further comprising: determining by the first server that said communicating with said another server over the first network has failed; and communicating with said another server through a message data field of the master lock to employ an alternative locking technique that does not rely on the first network.
 7. The method of claim 6, wherein said another server confirms through the message data field of the master lock the use of the alternative locking technique.
 8. A non-transient computer readable medium comprising instructions that are to be executed in each of a plurality of servers in a group that communicate with each other over a first network and issue storage commands to a shared data storage system over a second network, wherein the instructions when executed in the servers cause the servers to carry out a method of managing locks of files stored in the shared data storage system using a master lock that includes a data field for storing an address of one of the servers in the group to identify such server as a current server for managing the locks of files stored in the shared data storage system, said method comprising: at a first server of the group, receiving a request to update a particular file stored in the shared data storage system from an application executing within the first server; determining, by the first server, whether an address of another server in the group is stored in the data field of the master lock; if the data field of the master lock does not contain an address of another server in the group, updating the data field of the master lock to store an address of the first server and obtaining a lock to the particular file; if the data field of the master lock does contain the address of another server in the group, communicating with said another server over the first network using the address, a request to obtain the lock to the particular file; and after the lock to the particular file is obtained, performing an input-output operation on the particular file to fulfill the request to update the particular file.
 9. The non-transient computer readable medium of claim 8, wherein the lock to the particular file includes a lock owner data field, and a server ID of the first server is stored in the lock owner data field when the lock to the particular file is obtained by the first server.
 10. The non-transient computer readable medium of claim 8, wherein the method further comprises: rejecting, by said another server, the request to obtain the lock to the particular file if the lock to the particular file has been granted to a different server in the group.
 11. The non-transient computer readable medium of claim 10, wherein the method further comprises: if the request to obtain the lock to the particular file is rejected, communicating with said another server over the first network using the address, another request to obtain the lock to the particular file.
 12. The non-transient computer readable medium of claim 8, wherein the master lock and the locks of files are stored in the shared data storage system.
 13. The non-transient computer readable medium of claim 8, wherein the method further comprises: determining by the first server that said communicating with said another server over the first network has failed; and communicating with said another server through a message data field of the master lock to employ an alternative locking technique that does not rely on the first network.
 14. The non-transient computer readable medium of claim 13, wherein said another server confirms through the message data field of the master lock the use of the alternative locking technique.
 15. A system including a group of servers that communicate with each other over a first network and issue storage commands to a shared data storage system over a second network, the data storage system having stored therein files, a master lock, and locks of the files of the shared data storage system, the master lock including a data field for storing an address of one of the servers in the group to identify such server as a current server for managing the locks of the files of the shared data storage system, wherein each of the servers, in response to a request to update a particular file of the shared data storage system from an application executing therein, is programmed to: determine whether an address of another server in the group is stored in the data field of the master lock; if the data field of the master lock does not contain an address of another server in the group, update the data field of the master lock to store an address of the first server and obtain a lock to the particular file; if the data field of the master lock does contain the address of another server in the group, communicate with said another server over the first network using the address, a request to obtain the lock to the particular file; and after the lock to the particular file is obtained, perform an input-output operation on the particular file to fulfill the request to update the particular file.
 16. The system of claim 15, wherein the lock to the particular file includes a lock owner data field, and a server ID of the first server is stored in the lock owner data field when the lock to the particular file is obtained by the first server.
 17. The system of claim 15, wherein said another server is programmed to: reject the request to obtain the lock to the particular file if the lock to the particular file has been granted to a different server in the group.
 18. The system of claim 17, wherein each of the servers is further programmed to: if the request to obtain the lock to the particular file is rejected, communicate with said another server over the first network using the address, another request to obtain the lock to the particular file.
 19. The system of claim 15, wherein each of the servers is further programmed to: determine that said communicating with said another server over the first network has failed; and communicate with said another server through a message data field of the master lock to employ an alternative locking technique that does not rely on the first network.
 20. The system of claim 19, wherein said another server confirms through the message data field of the master lock the use of the alternative locking technique.
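
For illustration only, the lock acquisition flow recited in claims 1, 3 and 6 can be sketched as a small single-process simulation. It is not the claimed implementation. The names MasterLock, SharedStorage, LockServer and acquire_lock, the dictionary standing in for the first network, the returned status strings, and the text written to the message data field are all hypothetical. A real system would also need the check-and-update of the master lock to be atomic across hosts, which this sketch does not model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MasterLock:
    """Hypothetical in-memory stand-in for the master lock."""
    server_address: Optional[str] = None  # data field naming the current lock server
    message: Optional[str] = None         # message data field (claims 6 and 7)


@dataclass
class SharedStorage:
    """Hypothetical stand-in for the shared data storage system."""
    master_lock: MasterLock = field(default_factory=MasterLock)
    # Per-file lock owner data field: file path -> server ID of the owner.
    file_lock_owner: Dict[str, str] = field(default_factory=dict)


class LockServer:
    """Grants or rejects per-file lock requests on behalf of the group."""

    def __init__(self, storage: SharedStorage) -> None:
        self.storage = storage

    def request_lock(self, path: str, requester: str) -> bool:
        owner = self.storage.file_lock_owner.get(path)
        if owner is not None and owner != requester:
            return False  # already granted to a different server: reject
        self.storage.file_lock_owner[path] = requester
        return True


def acquire_lock(host: str, path: str, storage: SharedStorage,
                 network: Dict[str, LockServer]) -> str:
    """Return "granted", "rejected" or "fallback-requested" (assumed labels)."""
    master = storage.master_lock
    if master.server_address is None:
        # No server address recorded: this host records its own address
        # and starts acting as the lock server.
        master.server_address = host
        network[host] = LockServer(storage)
    # The dictionary lookup simulates reaching the server over the first network.
    server = network.get(master.server_address)
    if server is None:
        # Server unreachable: ask for an alternative locking technique
        # through the message data field of the master lock.
        master.message = f"{host}: request alternative locking"
        return "fallback-requested"
    return "granted" if server.request_lock(path, host) else "rejected"
```

In this sketch the first host to call acquire_lock becomes the lock server, later hosts are granted or rejected by that server, and removing the server from the simulated network triggers the message-field fallback. A rejected host could simply call acquire_lock again, as in claim 4.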